1.
J Phys Chem A ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38717302

ABSTRACT

Atomic partial charges are crucial parameters in molecular dynamics simulation, dictating the electrostatic contributions to intermolecular energies and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on surrogates of ab initio semiempirical quantum chemical methods such as AM1-BCC and is expensive for large systems or large numbers of molecules. We propose a hybrid physical/graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to differences in AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach in order to predict molecule-specific atomic electronegativity and hardness parameters, followed by analytical determination of optimal charge-equilibrated parameters that preserve total molecular charge. This hybrid approach scales linearly with the number of atoms, enabling for the first time the use of fully consistent charge models for small molecules and biopolymers for the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package EspalomaCharge, this approach provides drop-in replacements for both AmberTools antechamber and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at https://github.com/choderalab/espaloma-charge.
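
The analytical charge-equilibration step described in this abstract can be illustrated with a small sketch. It assumes the per-atom electronegativity and hardness values have already been predicted (here they are made-up numbers, not output of EspalomaCharge), and it assumes the usual second-order energy model, sum_i (e_i q_i + 0.5 s_i q_i^2), minimized under the constraint that charges sum to the total molecular charge. The function name `equilibrate_charges` is hypothetical and is not part of the EspalomaCharge API.

```python
def equilibrate_charges(electronegativity, hardness, total_charge=0.0):
    """Return charges minimizing sum_i (e_i*q_i + 0.5*s_i*q_i**2)
    subject to sum_i q_i == total_charge (closed-form Lagrange solution).

    Illustrative only: inputs stand in for the parameters a graph neural
    network would predict; this is not the EspalomaCharge implementation.
    """
    if len(electronegativity) != len(hardness) or not hardness:
        raise ValueError("need one electronegativity and one hardness per atom")
    if any(s <= 0 for s in hardness):
        raise ValueError("hardness values must be positive")

    inv_s = [1.0 / s for s in hardness]
    e_over_s = [e / s for e, s in zip(electronegativity, hardness)]

    # Stationarity gives q_i = (lam - e_i) / s_i; the charge constraint
    # fixes the Lagrange multiplier lam.
    lam = (total_charge + sum(e_over_s)) / sum(inv_s)
    return [lam * inv - eos for inv, eos in zip(inv_s, e_over_s)]


# Hypothetical three-atom anion with made-up parameters.
charges = equilibrate_charges([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], total_charge=-1.0)
```

Each atom needs only a few arithmetic operations once the two sums are known, so this step is linear in the number of atoms, consistent with the scaling claimed in the abstract.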

2.
ArXiv ; 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38351937

ABSTRACT

This letter gives results on improving protein-ligand binding affinity predictions based on molecular dynamics simulations using machine learning potentials with a hybrid neural network potential and molecular mechanics methodology (NNP/MM). We compute relative binding free energies (RBFE) with the Alchemical Transfer Method (ATM), validate its performance against established benchmarks, and find significant enhancements compared to conventional MM force fields like GAFF2.

3.
J Chem Inf Model ; 64(5): 1481-1485, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38376463

ABSTRACT

This letter gives results on improving protein-ligand binding affinity predictions based on molecular dynamics simulations using machine learning potentials with a hybrid neural network potential and molecular mechanics methodology (NNP/MM). We compute relative binding free energies with the Alchemical Transfer Method, validate its performance against established benchmarks, and find significant enhancements compared with conventional MM force fields like GAFF2.


Subjects
Molecular Dynamics Simulation; Proteins; Ligands; Thermodynamics; Proteins/chemistry; Protein Binding; Neural Networks, Computer
5.
J Phys Chem B ; 128(1): 109-116, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38154096

ABSTRACT

Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features in simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations with only a modest increase in cost.


Subjects
Molecular Dynamics Simulation; Water; Machine Learning
6.
Elife ; 12, 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38047771

ABSTRACT

Kinase inhibitors are successful therapeutics in the treatment of cancers and autoimmune diseases and are useful tools in biomedical research. However, the high sequence and structural conservation of the catalytic kinase domain complicate the development of selective kinase inhibitors. Inhibition of off-target kinases makes it difficult to study the mechanism of inhibitors in biological systems. Current efforts focus on the development of inhibitors with improved selectivity. Here, we present an alternative solution to this problem by combining inhibitors with divergent off-target effects. We develop a multicompound-multitarget scoring (MMS) method that combines inhibitors to maximize target inhibition and to minimize off-target inhibition. Additionally, this framework enables optimization of inhibitor combinations for multiple on-targets. Using MMS with published kinase inhibitor datasets we determine potent inhibitor combinations for target kinases with better selectivity than the most selective single inhibitor and validate the predicted effect and selectivity of inhibitor combinations using in vitro and in cellulo techniques. MMS greatly enhances selectivity in rational multitargeting applications. The MMS framework is generalizable to other non-kinase biological targets where compound selectivity is a challenge and diverse compound libraries are available.


Subjects
Antineoplastic Agents; Neoplasms; Humans; Protein Kinase Inhibitors/pharmacology; Protein Kinase Inhibitors/chemistry; Antineoplastic Agents/therapeutic use; Phosphotransferases; Catalytic Domain; Neoplasms/drug therapy
7.
Science ; 382(6671): eabo7201, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37943932

ABSTRACT

We report the results of the COVID Moonshot, a fully open-science, crowdsourced, and structure-enabled drug discovery campaign targeting the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease. We discovered a noncovalent, nonpeptidic inhibitor scaffold with lead-like properties that is differentiated from current main protease inhibitors. Our approach leveraged crowdsourcing, machine learning, exascale molecular simulations, and high-throughput structural biology and chemistry. We generated a detailed map of the structural plasticity of the SARS-CoV-2 main protease, extensive structure-activity relationships for multiple chemotypes, and a wealth of biochemical activity data. All compound designs (>18,000 designs), crystallographic data (>490 ligand-bound x-ray structures), assay data (>10,000 measurements), and synthesized molecules (>2400 compounds) for this campaign were shared rapidly and openly, creating a rich, open, and intellectual property-free knowledge base for future anticoronavirus drug discovery.


Subjects
COVID-19 Drug Treatment; Coronavirus 3C Proteases; Coronavirus Protease Inhibitors; Drug Discovery; SARS-CoV-2; Humans; Coronavirus 3C Proteases/antagonists & inhibitors; Coronavirus 3C Proteases/chemistry; Molecular Docking Simulation; Coronavirus Protease Inhibitors/chemical synthesis; Coronavirus Protease Inhibitors/chemistry; Coronavirus Protease Inhibitors/pharmacology; Structure-Activity Relationship; Crystallography, X-Ray
8.
ArXiv ; 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-37986730

ABSTRACT

Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.

9.
bioRxiv ; 2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37745489

ABSTRACT

In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches, but is fundamentally limited by the accuracy with which protein:ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase:inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark assessing how well different classes of docking and pose selection strategies recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures co-crystallized with 423 ATP-competitive ligands. We find that the docking methods biased by the co-crystallized ligand (utilizing shape overlap with or without maximum common substructure matching) are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance of generating a low-RMSD docking pose. Docking utilizing an approach that combines all three methods (Posit) into structures with the most similar co-crystallized ligands according to shape and electrostatics proved to be the most efficient way to reproduce binding poses, achieving a success rate of 66.9% across all included systems. The studied docking and pose selection strategies, which utilize the OpenEye Toolkit, were implemented into pipelines of the KinoML framework, allowing automated and reliable protein:ligand complex generation for future downstream machine learning tasks. Although focused on protein kinases, we believe the general findings can also be transferred to other protein families.

10.
J Chem Inf Model ; 63(18): 5701-5708, 2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37694852

ABSTRACT

Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared with traditional molecular mechanics. To tackle this issue, we introduce an optimized implementation of the hybrid method (NNP/MM), which combines a neural network potential (NNP) and molecular mechanics (MM). This approach models a portion of the system, such as a small molecule, using NNP while employing MM for the remaining system to boost efficiency. By conducting molecular dynamics (MD) simulations on various protein-ligand complexes and metadynamics (MTD) simulations on a ligand, we showcase the capabilities of our implementation of NNP/MM. It has enabled us to increase the simulation speed by ∼5 times and achieve a combined sampling of 1 µs for each complex, marking the longest simulations ever reported for this class of simulations.


Subjects
Molecular Dynamics Simulation; Neural Networks, Computer; Ligands; Machine Learning
11.
J Chem Theory Comput ; 19(15): 4863-4882, 2023 Aug 08.
Article in English | MEDLINE | ID: mdl-37450482

ABSTRACT

Relative alchemical binding free energy calculations are routinely used in drug discovery projects to optimize the affinity of small molecules for their drug targets. Alchemical methods can also be used to estimate the impact of amino acid mutations on protein:protein binding affinities, but these calculations can involve sampling challenges due to the complex networks of protein and water interactions frequently present in protein:protein interfaces. We investigate these challenges by extending a graphics processing unit (GPU)-accelerated open-source relative free energy calculation package (Perses) to predict the impact of amino acid mutations on protein:protein binding. Using the well-characterized model system barnase:barstar, we describe analyses for identifying and characterizing sampling problems in protein:protein relative free energy calculations. We find that mutations with sampling problems often involve charge changes, and inadequate sampling can be attributed to slow degrees of freedom that are mutation-specific. We also explore the accuracy and efficiency of current state-of-the-art approaches (alchemical replica exchange and alchemical replica exchange with solute tempering) for overcoming relevant sampling problems. By employing sufficiently long simulations, we achieve accurate predictions (RMSE 1.61, 95% CI: [1.12, 2.11] kcal/mol), with 86% of estimates within 1 kcal/mol of the experimentally determined relative binding free energies and 100% of predictions correctly classifying the sign of the changes in binding free energies. Ultimately, we provide a model workflow for applying protein mutation free energy calculations to protein:protein complexes, and importantly, catalog the sampling challenges associated with these types of alchemical transformations. Our free open-source package (Perses) is based on OpenMM and is available at https://github.com/choderalab/perses.
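
The three accuracy figures quoted in this abstract (RMSE, the fraction of estimates within 1 kcal/mol, and the fraction with the correct sign) are simple to compute from paired predicted and experimental values. The sketch below is only an illustration of those definitions: the function name `summarize_predictions` is hypothetical, the numbers are made up, "within" is assumed to mean an absolute error of at most the tolerance, and none of this is taken from Perses.

```python
import math


def summarize_predictions(predicted, experimental, tolerance=1.0):
    """Summarize agreement between predicted and experimental relative
    binding free energies (same units, e.g. kcal/mol).

    Illustrative definitions only; not the analysis code used by Perses.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(predicted)
    errors = [p - e for p, e in zip(predicted, experimental)]

    # Root-mean-square error over all pairs.
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    # Fraction of estimates whose absolute error is at most the tolerance.
    within = sum(abs(err) <= tolerance for err in errors) / n
    # Fraction of estimates that agree with experiment on the sign of the change.
    same_sign = sum((p > 0) == (e > 0) for p, e in zip(predicted, experimental)) / n

    return {
        "rmse": rmse,
        "fraction_within_tolerance": within,
        "fraction_correct_sign": same_sign,
    }


# Made-up values for four hypothetical mutations.
summary = summarize_predictions([1.0, -2.0, 0.5, 3.0], [0.5, -1.0, -0.5, 1.0])
```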


Subjects
Amino Acids; Molecular Dynamics Simulation; Thermodynamics; Entropy; Protein Binding
12.
J Chem Theory Comput ; 19(11): 3251-3275, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37167319

ABSTRACT

We introduce the Open Force Field (OpenFF) 2.0.0 small molecule force field for drug-like molecules, code-named Sage, which builds upon our previous iteration, Parsley. OpenFF force fields are based on direct chemical perception, which generalizes easily to highly diverse sets of chemistries based on substructure queries. Like the previous OpenFF iterations, the Sage generation of OpenFF force fields was validated in protein-ligand simulations to be compatible with AMBER biopolymer force fields. In this work, we detail the methodology used to develop this force field, as well as the innovations and improvements introduced since the release of Parsley 1.0.0. One particularly significant feature of Sage is a set of improved Lennard-Jones (LJ) parameters retrained against condensed phase mixture data, the first refit of LJ parameters in the OpenFF small molecule force field line. Sage also includes valence parameters refit to a larger database of quantum chemical calculations than previous versions, as well as improvements in how this fitting is performed. Force field benchmarks show improvements in general metrics of performance against quantum chemistry reference data such as root-mean-square deviations (RMSD) of optimized conformer geometries, torsion fingerprint deviations (TFD), and improved relative conformer energetics (ΔΔE). We present a variety of benchmarks for these metrics against our previous force fields and, in some cases, other small molecule force fields. Sage also demonstrates improved performance in estimating physical properties, including comparison against experimental data from various thermodynamic databases for small molecule properties such as ΔHmix, ρ(x), ΔGsolv, and ΔGtrans. Additionally, we benchmarked against protein-ligand binding free energies (ΔGbind), where Sage yields results statistically similar to previous force fields. All data are made publicly available, along with complete details on how to reproduce the training results, at https://github.com/openforcefield/openff-sage.


Subjects
Benchmarking; Proteins; Ligands; Proteins/chemistry; Thermodynamics; Entropy
13.
Proc Natl Acad Sci U S A ; 120(11): e2214168120, 2023 Mar 14.
Article in English | MEDLINE | ID: mdl-36877844

ABSTRACT

A common challenge in drug design pertains to finding chemical modifications to a ligand that increase its affinity to the target protein. An underutilized advance is the increase in structural biology throughput, which has progressed from an artisanal endeavor to a monthly throughput of hundreds of different ligands against a protein in modern synchrotrons. However, the missing piece is a framework that turns high-throughput crystallography data into predictive models for ligand design. Here, we designed a simple machine learning approach that predicts protein-ligand affinity from experimental structures of diverse ligands against a single protein paired with biochemical measurements. Our key insight is using physics-based energy descriptors to represent protein-ligand complexes and a learning-to-rank approach that infers the relevant differences between binding modes. We ran a high-throughput crystallography campaign against the SARS-CoV-2 main protease (MPro), obtaining parallel measurements of over 200 protein-ligand complexes and their binding activities. This allowed us to design one-step library syntheses that improved the potency of two distinct micromolar hits by over 10-fold, arriving at a noncovalent and nonpeptidomimetic inhibitor with 120 nM antiviral efficacy. Crucially, our approach successfully extends ligands to unexplored regions of the binding pocket, executing large and fruitful moves in chemical space with simple chemistry.


Subjects
COVID-19; Humans; Ligands; SARS-CoV-2; Antiviral Agents; Biology
14.
Nature ; 615(7954): 913-919, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36922589

ABSTRACT

Chromatin-binding proteins are critical regulators of cell state in haematopoiesis (refs. 1, 2). Acute leukaemias driven by rearrangement of the mixed lineage leukaemia 1 gene (KMT2Ar) or mutation of the nucleophosmin gene (NPM1) require the chromatin adapter protein menin, encoded by the MEN1 gene, to sustain aberrant leukaemogenic gene expression programs (refs. 3-5). In a phase 1 first-in-human clinical trial, the menin inhibitor revumenib, which is designed to disrupt the menin-MLL1 interaction, induced clinical responses in patients with leukaemia with KMT2Ar or mutated NPM1 (ref. 6). Here we identified somatic mutations in MEN1 at the revumenib-menin interface in patients with acquired resistance to menin inhibition. Consistent with the genetic data in patients, inhibitor-menin interface mutations represent a conserved mechanism of therapeutic resistance in xenograft models and in an unbiased base-editor screen. These mutants attenuate drug-target binding by generating structural perturbations that impact small-molecule binding but not the interaction with the natural ligand MLL1, and prevent inhibitor-induced eviction of menin and MLL1 from chromatin. To our knowledge, this study is the first to demonstrate that a chromatin-targeting therapeutic drug exerts sufficient selection pressure in patients to drive the evolution of escape mutants that lead to sustained chromatin occupancy, suggesting a common mechanism of therapeutic resistance.


Subjects
Drug Resistance, Neoplasm; Leukemia; Mutation; Proto-Oncogene Proteins; Animals; Humans; Antineoplastic Agents/chemistry; Antineoplastic Agents/metabolism; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Binding Sites/drug effects; Binding Sites/genetics; Chromatin/genetics; Chromatin/metabolism; Drug Resistance, Neoplasm/genetics; Leukemia/drug therapy; Leukemia/genetics; Leukemia/metabolism; Protein Binding/drug effects; Proto-Oncogene Proteins/antagonists & inhibitors; Proto-Oncogene Proteins/chemistry; Proto-Oncogene Proteins/genetics; Proto-Oncogene Proteins/metabolism
15.
bioRxiv ; 2023 Jun 21.
Article in English | MEDLINE | ID: mdl-36945557

ABSTRACT

Relative alchemical binding free energy calculations are routinely used in drug discovery projects to optimize the affinity of small molecules for their drug targets. Alchemical methods can also be used to estimate the impact of amino acid mutations on protein:protein binding affinities, but these calculations can involve sampling challenges due to the complex networks of protein and water interactions frequently present in protein:protein interfaces. We investigate these challenges by extending a GPU-accelerated open-source relative free energy calculation package (Perses) to predict the impact of amino acid mutations on protein:protein binding. Using the well-characterized model system barnase:barstar, we describe analyses for identifying and characterizing sampling problems in protein:protein relative free energy calculations. We find that mutations with sampling problems often involve charge changes, and inadequate sampling can be attributed to slow degrees of freedom that are mutation-specific. We also explore the accuracy and efficiency of current state-of-the-art approaches (alchemical replica exchange and alchemical replica exchange with solute tempering) for overcoming relevant sampling problems. By employing sufficiently long simulations, we achieve accurate predictions (RMSE 1.61, 95% CI: [1.12, 2.11] kcal/mol), with 86% of estimates within 1 kcal/mol of the experimentally determined relative binding free energies and 100% of predictions correctly classifying the sign of the changes in binding free energies. Ultimately, we provide a model workflow for applying protein mutation free energy calculations to protein:protein complexes, and importantly, catalog the sampling challenges associated with these types of alchemical transformations. Our free open-source package (Perses) is based on OpenMM and available at https://github.com/choderalab/perses.

16.
Sci Data ; 10(1): 11, 2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36599873

ABSTRACT

Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.

17.
bioRxiv ; 2023 Jan 16.
Article in English | MEDLINE | ID: mdl-36711619

ABSTRACT

Kinase inhibitors are successful therapeutics in the treatment of cancers and autoimmune diseases and are useful tools in biomedical research. The high sequence and structural conservation of the catalytic kinase domain complicates the development of specific kinase inhibitors. As a consequence, most kinase inhibitors also inhibit off-target kinases, which complicates the interpretation of phenotypic responses. Additionally, inhibition of off-targets may cause toxicity in patients. Therefore, highly selective kinase inhibition is a major goal in both biomedical research and clinical practice. Currently, efforts to improve selective kinase inhibition are dominated by the development of new kinase inhibitors. Here, we present an alternative solution to this problem by combining inhibitors with divergent off-target activities. We have developed a multicompound-multitarget scoring (MMS) framework that combines inhibitors to maximize target inhibition and to minimize off-target inhibition. Additionally, this framework enables rational polypharmacology by allowing optimization of inhibitor combinations against multiple selected on-targets and off-targets. Using MMS with previously published chemogenomic kinase inhibitor datasets, we determine inhibitor combinations that achieve potent activity against a target kinase and that are more selective than the most selective single inhibitor against that target. We validate the calculated effect and selectivity of a combination of inhibitors using the in cellulo NanoBRET assay. The MMS framework is generalizable to other pharmacological targets where compound specificity is a challenge and diverse compound libraries are available.

18.
J Chem Inf Model ; 62(22): 5622-5633, 2022 Nov 28.
Article in English | MEDLINE | ID: mdl-36351167

ABSTRACT

The development of accurate transferable force fields is key to realizing the full potential of atomistic modeling in the study of biological processes such as protein-ligand binding for drug discovery. State-of-the-art transferable force fields, such as those produced by the Open Force Field Initiative, use modern software engineering and automation techniques to yield accuracy improvements. However, force field torsion parameters, which must account for many stereoelectronic and steric effects, are considered to be less transferable than other force field parameters and are therefore often targets for bespoke parametrization. Here, we present the Open Force Field QCSubmit and BespokeFit software packages that, when combined, facilitate the fitting of torsion parameters to quantum mechanical reference data at scale. We demonstrate the use of QCSubmit for simplifying the process of creating and archiving large numbers of quantum chemical calculations, by generating a dataset of 671 torsion scans for druglike fragments. We use BespokeFit to derive individual torsion parameters for each of these molecules, thereby reducing the root-mean-square error in the potential energy surface from 1.1 kcal/mol, using the original transferable force field, to 0.4 kcal/mol using the bespoke version. Furthermore, we employ the bespoke force fields to compute the relative binding free energies of a congeneric series of inhibitors of the TYK2 protein, and demonstrate further improvements in accuracy, compared to the base force field (MUE reduced from $0.56_{0.39}^{0.77}$ to $0.42_{0.28}^{0.59}$ kcal/mol and $R^2$ correlation improved from $0.72_{0.35}^{0.87}$ to $0.93_{0.84}^{0.97}$).


Subjects
Proteins; Software; Ligands; Proteins/chemistry; Entropy; Protein Binding
19.
Article in English | MEDLINE | ID: mdl-36382113

ABSTRACT

Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their expected accuracy on real-world systems (benchmarking) becomes critical to provide users with an assessment of the accuracy expected when these methods are applied within their domain of applicability, and developers with a way to assess the expected impact of new methodologies. These assessments require construction of a benchmark: a set of well-prepared, high-quality systems with corresponding experimental measurements designed to ensure the resulting calculations provide a realistic assessment of expected performance when these methods are deployed within their domains of applicability. To date, the community has not yet adopted a common standardized benchmark, and existing benchmark reports suffer from a myriad of issues, including poor data quality, limited statistical power, and statistically deficient analyses, all of which can conspire to produce benchmarks that are poorly predictive of real-world performance. Here, we address these issues by presenting guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analyzing the resulting predictions to enable statistically meaningful comparisons among methods and force fields. We highlight challenges and open questions that remain to be solved in these areas, as well as recommendations for the collection of new datasets that might optimally serve to measure progress as methods become systematically more reliable. Finally, we provide a curated, versioned, open, standardized benchmark set adherent to these standards (PLBenchmarks) and an open source toolkit for implementing standardized best practices assessments (arsenic) for the community to use as a standardized assessment tool. While our main focus is free energy methods based on molecular simulations, these guidelines should prove useful for assessment of the rapidly growing field of machine learning methods for affinity prediction as well.
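
One way to make comparisons among methods statistically meaningful, in the spirit of guideline (3), is to report an uncertainty interval alongside each accuracy statistic. The sketch below uses a percentile bootstrap over systems to put an interval on the mean unsigned error. The choice of the bootstrap is an assumption made for illustration; the abstract does not say which procedure the guidelines or the arsenic toolkit use, and the function names here are hypothetical.

```python
import random


def mean_unsigned_error(predicted, experimental):
    """Mean absolute difference between paired values."""
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def bootstrap_interval(predicted, experimental, statistic=mean_unsigned_error,
                       n_resamples=1000, confidence=0.95, seed=0):
    """Percentile-bootstrap interval for a paired accuracy statistic.

    Illustrative sketch only; not the procedure implemented in arsenic.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")

    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = len(predicted)
    samples = []
    for _ in range(n_resamples):
        # Resample systems with replacement, keeping each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(statistic([predicted[i] for i in idx],
                                 [experimental[i] for i in idx]))
    samples.sort()

    alpha = (1.0 - confidence) / 2.0
    lower = samples[int(alpha * (n_resamples - 1))]
    upper = samples[int((1.0 - alpha) * (n_resamples - 1))]
    return statistic(predicted, experimental), lower, upper


# Made-up predicted and experimental values for five hypothetical systems.
point, low, high = bootstrap_interval([1.2, -0.4, 2.1, 0.3, -1.5],
                                      [0.9, -1.0, 1.5, 0.8, -1.1])
```

With only a handful of systems the interval is wide, which illustrates the abstract's point about limited statistical power in small benchmark sets.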

20.
Chem Sci ; 13(41): 12016-12033, 2022 Oct 26.
Article in English | MEDLINE | ID: mdl-36349096

ABSTRACT

Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules (atom types) for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process (spanning chemical perception to parameter assignment) is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with higher fidelity than traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best practices with low error. When trained on the same quantum chemical small molecule dataset used to parameterize the Open Force Field ("Parsley") openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-à-vis experiments in relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open source package espaloma, available at https://github.com/choderalab/espaloma.
